
    Low-power dynamic object detection and classification with freely moving event cameras

    We present the first purely event-based, energy-efficient approach for dynamic object detection and categorization with a freely moving event camera. Compared to traditional cameras, event-based object recognition systems are considerably behind in terms of accuracy and algorithmic maturity. To this end, this paper presents an event-based feature extraction method devised by accumulating local activity across the image frame and then applying principal component analysis (PCA) to the normalized neighborhood region. Subsequently, we propose a backtracking-free k-d tree mechanism for efficient feature matching that exploits the low dimensionality of the feature representation. Additionally, the proposed k-d tree mechanism allows for feature selection to obtain a lower-dimensional object representation when hardware resources are too limited to implement PCA. Consequently, the proposed system can be realized on a field-programmable gate array (FPGA) device, leading to a high performance-to-resource ratio. The proposed system is tested on real-world event-based datasets for object categorization, showing superior classification performance compared to state-of-the-art algorithms. Additionally, we verified the real-time FPGA performance of the proposed object detection method, trained with limited data unlike deep learning methods, in a closed-loop aerial vehicle flight mode. We also compare the proposed object categorization framework to pre-trained convolutional neural networks using transfer learning and highlight the drawbacks of using frame-based sensors under dynamic camera motion. Finally, we provide critical insights into how the feature extraction method and the classification parameters affect system performance, which aids in adapting the framework to various low-power (less than a few watts) application scenarios.
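
    A minimal sketch of the two-stage pipeline described above, on synthetic data: accumulate local event activity, describe each normalized neighborhood with PCA, and match the low-dimensional descriptors with a k-d tree. The patch size, number of components, and keypoint selection are illustrative assumptions, and the standard SciPy k-d tree stands in for the paper's backtracking-free variant.

```python
import numpy as np
from scipy.spatial import cKDTree
from sklearn.decomposition import PCA

def extract_patches(event_frame, keypoints, patch_size=9):
    """Cut L2-normalized neighborhood patches around event keypoints."""
    r = patch_size // 2
    patches = []
    for x, y in keypoints:
        patch = event_frame[y - r:y + r + 1, x - r:x + r + 1].astype(float)
        norm = np.linalg.norm(patch)
        if norm > 0:
            patches.append((patch / norm).ravel())
    return np.asarray(patches)

# Synthetic frame of accumulated local event activity (stand-in for real data).
rng = np.random.default_rng(0)
frame = rng.poisson(2.0, size=(128, 128))
keypoints = [(rng.integers(5, 123), rng.integers(5, 123)) for _ in range(200)]

patches = extract_patches(frame, keypoints)
pca = PCA(n_components=8).fit(patches)       # low-dimensional PCA basis
descriptors = pca.transform(patches)         # compact per-keypoint descriptors

# Ordinary k-d tree over the 8-D descriptors; the paper's variant additionally
# avoids backtracking during the nearest-neighbor search.
tree = cKDTree(descriptors)
dist, idx = tree.query(descriptors[0], k=2)  # nearest neighbors of one feature
print(descriptors.shape, idx)
```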

    Fast sparse coding for range data denoising with sparse ridges constraint

    Light detection and ranging (LiDAR) sensors have been widely deployed on intelligent systems such as unmanned ground vehicles (UGVs) and unmanned aerial vehicles (UAVs) to perform localization, obstacle detection, and navigation tasks. Thus, research into range data processing with competitive performance in terms of both accuracy and efficiency has attracted increasing attention. Sparse coding has revolutionized signal processing and led to state-of-the-art performance in a variety of applications. However, dictionary learning, which plays a central role in sparse coding techniques, is computationally demanding, which limits its applicability in real-time systems. In this study, we propose sparse coding algorithms with a fixed pre-learned ridge dictionary to realize range data denoising by leveraging the regularity of laser range measurements in man-made environments. Experiments on both synthesized and real data demonstrate that our method obtains accuracy comparable to that of sophisticated sparse coding methods, but with much higher computational efficiency.
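
    A hedged sketch of the central idea: denoise a 1-D range profile by sparse coding against a fixed dictionary, skipping dictionary learning entirely. The constant, ramp, and hinge atoms below merely stand in for the paper's pre-learned ridge dictionary; the window length, noise level, and sparsity are assumptions.

```python
import numpy as np
from sklearn.decomposition import sparse_encode

n = 32                                            # length of one range-scan window
t = np.linspace(0, 1, n)
atoms = [np.ones(n), t]                           # constant and linear-ramp atoms
atoms += [np.maximum(t - b, 0) for b in np.linspace(0.1, 0.9, 9)]  # hinge atoms
D = np.asarray(atoms)
D /= np.linalg.norm(D, axis=1, keepdims=True)     # unit-norm dictionary rows

rng = np.random.default_rng(1)
clean = 2.0 + 0.5 * t + np.maximum(t - 0.5, 0)    # piecewise-linear range profile
noisy = clean + rng.normal(0, 0.05, n)

# Sparse-code against the *fixed* dictionary, then reconstruct to denoise.
code = sparse_encode(noisy[None, :], D, algorithm="omp", n_nonzero_coefs=3)
denoised = (code @ D)[0]
print(float(np.abs(denoised - clean).mean()))     # small residual error
```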

    Neuromorphic engineering needs closed-loop benchmarks

    Neuromorphic engineering aims to build (autonomous) systems by mimicking biological systems. It is motivated by the observation that biological organisms—from algae to primates—excel in sensing their environment and reacting promptly to their perils and opportunities. Furthermore, they do so more resiliently than our most advanced machines, at a fraction of the power consumption. It follows that the performance of neuromorphic systems should be evaluated in terms of real-time operation, power consumption, and resiliency to real-world perturbations and noise using task-relevant evaluation metrics. Yet, following in the footsteps of conventional machine learning, most neuromorphic benchmarks rely on recorded datasets that foster sensing accuracy as the primary measure of performance. Sensing accuracy is but an arbitrary proxy for the system's actual goal—making a good decision in a timely manner. Moreover, static datasets hinder our ability to study and compare the closed-loop sensing and control strategies that are central to survival for biological organisms. This article makes the case for a renewed focus on closed-loop benchmarks involving real-world tasks. Such benchmarks will be crucial in developing and progressing neuromorphic intelligence. The shift towards dynamic, real-world benchmarking tasks should usher in richer, more resilient, and more robust artificially intelligent systems in the future.

    Boosted kernelized correlation filters for event-based face detection

    Recently, deep learning has revolutionized the computer vision field and has resulted in steep advances in the performance of vision systems for human detection and classification on large datasets. Nevertheless, these systems rely on static cameras that do not yield practical results, especially for prolonged monitoring periods and when multiple object activities occur simultaneously. We propose that event cameras naturally solve these issues at the hardware level via asynchronous, pixel-level brightness sensing at microsecond time-scale. First, event cameras do not output data during no-activity periods, so the data rate is drastically lowered without any additional processing. Second, event cameras produce disjoint spatial outputs for multiple objects without requiring segmentation or explicit background modeling. Leveraging these attractive properties, this paper presents an event-based feature learning method using kernelized correlation filters (KCF) within a boosting framework. A key contribution is the reformulation of KCFs to learn the face representation instead of relying on hand-crafted feature descriptors as done in previous works. We report high detection performance on data collected using an event camera and showcase its potential for surveillance applications. To foster further research, we release the face dataset used in our work to the wider community.
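
    The reformulated, boosted KCF itself is the paper's contribution; the sketch below shows only the simpler linear correlation-filter core (a MOSSE-style ridge solution in the Fourier domain) on a synthetic patch. The Gaussian target width and regularizer are assumptions.

```python
import numpy as np

def train_filter(template, sigma=2.0, lam=1e-2):
    """Learn per-frequency coefficients mapping the template to a Gaussian peak."""
    h, w = template.shape
    ys, xs = np.mgrid[0:h, 0:w]
    target = np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma ** 2))
    X, Y = np.fft.fft2(template), np.fft.fft2(target)
    return Y * np.conj(X) / (X * np.conj(X) + lam)  # closed-form ridge solution

def detect(filt, patch):
    """Correlation response map; the peak marks the detected location."""
    return np.real(np.fft.ifft2(filt * np.fft.fft2(patch)))

rng = np.random.default_rng(2)
face = rng.random((64, 64))                 # stand-in for an accumulated event patch
filt = train_filter(face)
resp = detect(filt, np.roll(face, (5, 3), axis=(0, 1)))
print(np.unravel_index(resp.argmax(), resp.shape))  # peak near (32 + 5, 32 + 3)
```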

    Unseen object categorization using multiple visual cues

    In this paper, we propose an object categorization framework that extracts different visual cues to tackle the problem of categorizing previously unseen objects under various viewpoints. Specifically, we decompose the input image into three visual cues: structure, texture, and shape. Then, local features are extracted using the log-polar transform to achieve scale and rotation invariance. The local descriptors obtained from the different visual cues are fused using the bag-of-words representation, with two key contributions: (1) a keypoint detection scheme based on variational calculus for selecting sampling locations; and (2) a codebook optimization scheme based on discrete entropy that chooses the optimal codewords and at the same time increases overall performance. We tested the proposed framework on the ETH-80 dataset using the leave-one-object-out protocol to specifically tackle the problem of categorizing previously unseen objects under various viewpoints. On this popular dataset, the proposed system achieved a substantial improvement in classification performance over state-of-the-art methods.
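
    A minimal sketch of the log-polar sampling step on a synthetic image, with illustrative grid sizes: rotations about the keypoint become cyclic shifts along the angular axis and scalings become shifts along the log-radius axis, which a shift-invariant summary such as the 2-D FFT magnitude then largely absorbs.

```python
import numpy as np

def log_polar_descriptor(image, center, n_r=16, n_theta=32, r_max=20.0):
    """Sample the image around `center` on a log-polar grid (nearest neighbor)."""
    cy, cx = center
    radii = np.exp(np.linspace(0.0, np.log(r_max), n_r))        # log-spaced radii
    thetas = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)
    ys = np.clip((cy + radii[:, None] * np.sin(thetas)).round().astype(int),
                 0, image.shape[0] - 1)
    xs = np.clip((cx + radii[:, None] * np.cos(thetas)).round().astype(int),
                 0, image.shape[1] - 1)
    return image[ys, xs]                             # (n_r, n_theta) grid of samples

rng = np.random.default_rng(3)
img = rng.random((100, 100))
grid = log_polar_descriptor(img, (50, 50))
descriptor = np.abs(np.fft.fft2(grid)).ravel()       # invariant to cyclic grid shifts
print(descriptor.shape)
```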

    Multiple object cues for high performance vector quantization

    In this paper, we propose a multi-cue object representation for image classification using the standard bag-of-words model. Ever since the success of the bag-of-words model for image classification, several modifications of it have been proposed in the literature. These variants aim to improve key aspects such as efficient and compact dictionary learning, advanced image encoding techniques, pooling methods, and efficient kernels for the final classification step. In particular, “soft-encoding” methods such as sparse coding, locality-constrained linear coding, and Fisher vector encoding have received great attention in the literature as improvements upon the “hard assignment” obtained by vector quantization. Nevertheless, these methods come at a higher computational cost, while little attention has been paid to the extracted local features. In contrast, we propose a novel multi-cue object representation for image classification using simple vector quantization, and show highly competitive classification performance compared to state-of-the-art methods on popular datasets like Caltech-101 and MICC Flickr-101. Apart from the object representation, we also propose a novel keypoint detection scheme that achieves a classification rate comparable to the popular dense keypoint sampling strategy at a much lower computational cost.
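
    For reference, a minimal sketch of the hard-assignment bag-of-words pipeline the paper builds on: a k-means codebook, nearest-codeword vector quantization, and histogram pooling. The codebook size and descriptor dimensionality are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
train_desc = rng.random((2000, 64))   # local descriptors pooled from training images
codebook = KMeans(n_clusters=64, n_init=4, random_state=0).fit(train_desc)

def bow_histogram(descriptors, codebook):
    """Vector-quantize descriptors and pool them into an L1-normalized histogram."""
    words = codebook.predict(descriptors)                       # hard assignment
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

image_desc = rng.random((300, 64))    # descriptors from a single test image
print(bow_histogram(image_desc, codebook)[:5])
```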

    A trustless federated framework for decentralized and confidential deep learning

    Nowadays, deep learning models can be trained on large amounts of web data on power-hungry servers and be deployment-ready for specific real-world applications. With a state-of-the-art model architecture and a large publicly available dataset for pre-training, convolutional neural network models can be further fine-tuned via transfer learning for a related task. Nonetheless, the training process required by privacy-sensitive applications needs to protect data confidentiality and simultaneously boost performance using limited training data. For such data-deprived and privacy-centric learning, we introduce a trustless federated learning framework that seamlessly integrates deep learning models from different edge nodes using a blockchain-based architecture. Our framework performs federated learning without the need for a central server by leveraging a smart-contract blockchain platform with a distributed file system for model storage. Users can locally train on their data while routinely benefiting from an enhanced model obtained by merging the models from all users in a decentralized fashion. Most importantly, this framework is free of the potential single point of failure in centralized federated learning. Moreover, our framework has built-in incentive mechanisms to prevent model corruption and deter bad actors. We tested our framework on various computer vision datasets. The experimental results show that the merged model accuracy is on par with that of a centralized federated training setup. To the best of our knowledge, this work represents the first systematic attempt at building a blockchain-based federated deep learning framework for computer vision. The code is publicly available at: https://github.com/s-elo/DNN-Blockchain
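
    A minimal PyTorch sketch of the decentralized merge step, assuming equal weighting of peers and a toy model; the blockchain smart contract, distributed file storage, and incentive mechanisms from the paper are out of scope here.

```python
import torch
import torch.nn as nn

def merge_models(state_dicts):
    """Average corresponding tensors across peer checkpoints (FedAvg-style)."""
    return {key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
            for key in state_dicts[0]}

# Each peer trains locally; here three toy models stand in for trained peers.
peers = [nn.Linear(10, 2) for _ in range(3)]
merged_sd = merge_models([p.state_dict() for p in peers])

# Any node can reconstruct the enhanced global model from the shared checkpoints,
# so no central server is needed for aggregation.
global_model = nn.Linear(10, 2)
global_model.load_state_dict(merged_sd)
```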

    Vehicle detection in remote sensing images leveraging on simultaneous super-resolution

    Owing to the relatively small size of vehicles in remote sensing images, which lack sufficient appearance detail to distinguish vehicles from similar objects, detection performance is still far from satisfactory compared with detection results on everyday images. Inspired by the positive effect of super-resolution convolutional neural networks (SRCNN) on object detection and the stunning success of deep CNN techniques, we apply a generative adversarial network framework to realize simultaneous super-resolution and vehicle detection in an end-to-end manner, where the detection loss is backpropagated into the SRCNN during training to facilitate detection. In particular, our approach is unsupervised and bypasses the requirement of low-/high-resolution image pairs during the training stage, achieving increased generality and applicability. Extensive experiments on representative datasets demonstrate that our method outperforms state-of-the-art detectors. (The source code will be made available after the review process.)
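
    A schematic PyTorch sketch of the end-to-end coupling: the detection loss is backpropagated through the super-resolution generator, so super-resolution learns features that help detection. The tiny stand-in networks, binary "vehicle present" head, and optimizer settings are placeholder assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

sr_net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                       nn.Conv2d(16, 3, 3, padding=1))          # stand-in generator
det_head = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))

opt = torch.optim.Adam(list(sr_net.parameters()) + list(det_head.parameters()),
                       lr=1e-4)
low_res = torch.rand(4, 3, 64, 64)             # batch of low-resolution tiles
labels = torch.randint(0, 2, (4, 1)).float()   # vehicle present / absent

sr_img = sr_net(low_res)                       # super-resolved images
loss = F.binary_cross_entropy_with_logits(det_head(sr_img), labels)
loss.backward()   # detection gradients flow into sr_net, steering SR toward detection
opt.step()
```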

    Shape classification using invariant features and contextual information in the bag-of-words model

    In this paper, we describe a classification framework for binary shapes that exhibit scale, rotation, and strong viewpoint variations. To this end, we develop several novel techniques. First, we employ the spectral magnitude of the log-polar transform as a local feature in the bag-of-words model. Second, we incorporate contextual information in the bag-of-words model using a novel method to extract bi-grams from the spatial co-occurrence matrix. Third, a novel metric termed the ‘weighted gain ratio’ is proposed to select a suitable codebook size in the bag-of-words model. The proposed metric is generic, and hence it can be used for any clustering quality evaluation task. Fourth, a joint learning framework is proposed to learn features in a data-driven manner and thus avoid manual fine-tuning of the model parameters. We test our shape classification system on the animal shapes dataset and significantly outperform state-of-the-art methods in the literature.
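
    As one plausible reading of the bi-gram step (the exact construction is the paper's; the radius, vocabulary size, and selection rule below are assumptions): count pairs of codewords whose keypoints fall within a spatial radius, then keep the strongest unordered pairs as extra "bi-gram" dimensions.

```python
import numpy as np

def cooccurrence_bigrams(positions, words, n_words, radius=15.0, top_k=20):
    """Build a spatial co-occurrence matrix of codewords and keep the top pairs."""
    C = np.zeros((n_words, n_words))
    dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    close = (dists < radius) & ~np.eye(len(words), dtype=bool)
    for i, j in zip(*np.nonzero(close)):
        C[words[i], words[j]] += 1
    C = np.triu(C + C.T)                          # fold into unordered word pairs
    order = np.argsort(C, axis=None)[::-1][:top_k]
    pairs = np.column_stack(np.unravel_index(order, C.shape))
    return [(int(a), int(b), int(C[a, b])) for a, b in pairs]

rng = np.random.default_rng(5)
pos = rng.random((100, 2)) * 100                  # keypoint locations in one shape
w = rng.integers(0, 50, 100)                      # codeword index per keypoint
print(cooccurrence_bigrams(pos, w, n_words=50)[:3])
```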

    SOFEA: a non-iterative and robust optical flow estimation algorithm for dynamic vision sensors

    We introduce the single-shot optical flow estimation algorithm (SOFEA) to non-iteratively compute the continuous-time flow information of events produced by bio-inspired cameras such as the dynamic vision sensor (DVS). The output of a DVS is a stream of asynchronous spikes (“events”), transmitted at minimal latency (1–10 μs), caused by local brightness changes. Due to this unconventional output, a continuous representation of events over time is invaluable to most applications using the DVS. To this end, SOFEA consolidates the spatio-temporal information on the surface of active events for flow estimation in a single-shot manner, as opposed to the iterative methods in the literature. In contrast to previous works, this is also the first principled method for finding a locally optimal set of neighboring events for plane fitting, using an adaptation of Prim’s algorithm. Consequently, SOFEA produces flow estimates that are more accurate across a wide variety of scenes compared to state-of-the-art methods. A direct application of such flow estimation is rendering sharp event images using the set of active events at a given time, which is further demonstrated and compared to existing works. (The source code will be made available at our homepage after the review process.)
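
    A rough, hedged illustration of the plane-fitting step on synthetic events: fit t = a*x + b*y + c to the timestamps of neighboring events on the surface of active events and read the normal flow from the gradient. SOFEA's Prim-based selection of a locally optimal neighborhood is replaced here by an ordinary least-squares fit over a fixed set of events.

```python
import numpy as np

def plane_fit_flow(xs, ys, ts):
    """Least-squares plane through (x, y, t) events -> normal flow (vx, vy)."""
    A = np.column_stack([xs, ys, np.ones_like(xs)])
    (a, b, _), *_ = np.linalg.lstsq(A, ts, rcond=None)
    g2 = a * a + b * b
    return (a / g2, b / g2) if g2 > 0 else (0.0, 0.0)   # v = grad(t) / |grad(t)|^2

# Synthetic edge sweeping in +x at 2 px/s: timestamps grow by 0.5 s per pixel in x.
rng = np.random.default_rng(6)
xs = rng.uniform(0, 10, 50)
ys = rng.uniform(0, 10, 50)
ts = 0.5 * xs + rng.normal(0, 1e-3, 50)
print(plane_fit_flow(xs, ys, ts))                # approximately (2.0, 0.0) px/s
```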